# Document Image Analysis
Qwen2.5 VL 7B Instruct Quantized.w4a16
Apache-2.0
Quantized version of Qwen2.5-VL-7B-Instruct that takes vision-text input and produces text output, with weights quantized to INT4 and activations kept in FP16.
Image-to-Text
Transformers English

RedHatAI
605
3
Paligemma2 3b Ft Docci 448
PaliGemma 2 is an upgraded vision-language model released by Google that combines the Gemma 2 language model with the SigLIP vision encoder and supports multilingual vision-language tasks.
Image-to-Text
Transformers

google
8,765
12
Sd3 Long Captioner V2
Apache-2.0
A fine-tuned image-to-text generation model based on the 224x224 version of PaliGemma, specializing in generating detailed descriptions of artistic images.
Image-to-Text
Transformers Multilingual

gokaygokay
135
25
Featured Recommended AI Models